Methods for human evaluation of machine translation

Authors

  • Sofia Bremin
  • Hongzhan Hu
  • Johanna Karlsson
  • Anna Prytz Lillkull
  • Martin Wester
  • Henrik Danielsson
  • Sara Stymne
Abstract

Evaluation of machine translation (MT) is a difficult task, both for humans and for automatic metrics. The main difficulty lies in the fact that there is not one single correct translation, but many alternative good translation options. MT systems are often evaluated using automatic metrics, which commonly rely on comparing a translation to only a single human reference translation. An alternative is different types of human evaluation, commonly ranking of systems, estimation of adequacy and fluency on some scale, or error analysis. We have explored four different evaluation methods on the output of three different statistical MT systems. The main focus is on different types of human evaluation. We compare two conventional evaluation methods, human error analysis and automatic metrics, to two less commonly used evaluation methods based on reading comprehension and eye-tracking. These two evaluation methods are performed without the subjects seeing the source sentence. There have been few previous attempts at using reading comprehension and eye-tracking for MT evaluation. One example of a reading comprehension study is Fuji (1999), who conducted an experiment to compare English-to-Japanese MT to several versions of manual corrections of the system output. He found significant differences on reading comprehension questions between texts with large quality differences. Doherty and O'Brien (2009) is the only study we are aware of that uses eye-tracking for MT evaluation. They found that average gaze time and fixation counts were significantly lower for sentences judged as excellent in an earlier evaluation than for sentences judged as bad. Like previous research, we find that both reading comprehension and eye-tracking can be useful for MT evaluation. The results of these methods are consistent with the other methods when comparing systems with a large quality difference. For systems of similar quality, however, the different evaluation methods often do not show any significant differences.
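The abstract's point about automatic metrics is that they typically score MT output against a single human reference, so legitimate alternative wordings are penalized. As a minimal, hedged sketch of that single-reference setup (NLTK's sentence-level BLEU and the toy sentences below are illustrative choices only, not the systems, data, or metric configuration used in this paper):

```python
# Minimal illustration of a single-reference automatic metric (sentence-level BLEU).
# The sentences are invented for demonstration; they are not from the paper's data.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the cat sat on the mat".split()          # the one human reference translation
hypothesis = "the cat is sitting on the mat".split()  # MT system output to be scored

# Smoothing avoids zero scores when a higher-order n-gram has no match.
smooth = SmoothingFunction().method1
score = sentence_bleu([reference], hypothesis, smoothing_function=smooth)
print(f"BLEU against the single reference: {score:.3f}")

# A legitimate paraphrase of the same meaning, worded differently from the reference,
# scores lower simply because only one reference is available.
paraphrase = "a cat was sitting upon the mat".split()
para_score = sentence_bleu([reference], paraphrase, smoothing_function=smooth)
print(f"Paraphrase BLEU: {para_score:.3f}")
```

The paraphrase conveys the same meaning but is penalized for diverging from the one reference, which is exactly the limitation of single-reference automatic metrics that motivates the human evaluation methods compared in the paper.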


Similar articles

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines, as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from the Lexical Similarity set on machine tra...


A Comparative Study of English-Persian Translation of Neural Google Translation

Many studies abroad have focused on neural machine translation, and almost all concluded that this method was much closer to human translation than previous machine translation approaches. Therefore, this paper aimed to investigate whether neural machine translation is more acceptable for English-Persian translation in comparison with conventional machine translation. Hence, two types of text were chosen to be translated...


Improvement and Development of an English-to-Persian Computer-Assisted Translation System

In recent years, significant improvements have been achieved in statistical machine translation (SMT), but even the best machine translation technology is still far from replacing, or even competing with, human translators. Another way to increase the productivity of the translation process is a computer-assisted translation (CAT) system. In a CAT system, the human translator begins to type the tra...


Survey of Machine Translation Evaluation

The evaluation of machine translation (MT) systems is an important and active research area. Many methods have been proposed to determine and optimize the output quality of MT systems. Because of the complexity of natural languages, it is not easy to find optimal evaluation methods. The early methods are based on human judgements. They are reliable but expensive, i.e. time-consuming and non-reu...


Linguistic-based Evaluation Criteria to identify Statistical Machine Translation Errors

Machine translation evaluation methods are highly necessary in order to analyze the performance of translation systems. Up to now, the most traditional methods have been the use of automatic measures such as BLEU or the quality perception judgements of native human evaluators. In order to complement these traditional procedures, the current paper presents a new human evaluation based on the expert kn...


Modern MT Systems and the Myth of Human Translation: Real World Status Quo

This paper objects to the current consensus that machine translation (MT) systems are generally inferior to human translation (HT) in terms of translation quality. In our opinion, this belief is erroneous for many reasons, the two most important being a lack of formalism in comparison methods and a certain supineness in recovering from past experience. As a side effect, this paper will provide ev...



Publication date: 2011